Crowdsourcing the General Public for Large Scale Molecular Pathology Studies in Cancer

Authors

  • Francisco J. Candido dos Reis
  • Stuart Lynn
  • H. Raza Ali
  • Diana Eccles
  • Andrew Hanby
  • Elena Provenzano
  • Carlos Caldas
  • William J. Howat
  • Leigh-Anne McDuffus
  • Bin Liu
  • Frances Daley
  • Penny Coulson
  • Rupesh J. Vyas
  • Leslie M. Harris
  • Joanna M. Owens
  • Amy F.M. Carton
  • Janette P. McQuillan
  • Andy M. Paterson
  • Zohra Hirji
  • Sarah K. Christie
  • Amber R. Holmes
  • Marjanka K. Schmidt
  • Montserrat Garcia-Closas
  • Douglas F. Easton
  • Manjeet K. Bolla
  • Qin Wang
  • Javier Benitez
  • Roger L. Milne
  • Arto Mannermaa
  • Fergus Couch
  • Peter Devilee
  • Robert A.E.M. Tollenaar
  • Caroline Seynaeve
  • Angela Cox
  • Simon S. Cross
  • Fiona M. Blows
  • Joyce Sanders
  • Renate de Groot
  • Jonine Figueroa
  • Mark Sherman
  • Maartje Hooning
  • Hermann Brenner
  • Bernd Holleczek
  • Christa Stegmaier
  • Chris Lintott
  • Paul D.P. Pharoah
Abstract

BACKGROUND Citizen science, scientific research conducted by non-specialists, has the potential to facilitate biomedical research using available large-scale data; however, validating the results is challenging. Cell Slider is a citizen science project that aims to share images of tumors with the general public, enabling them to score tumor markers independently through an internet-based interface.

METHODS From October 2012 to June 2014, 98,293 Citizen Scientists accessed the Cell Slider web page and scored 180,172 sub-images derived from images of 12,326 tissue microarray cores labeled for estrogen receptor (ER). We evaluated the accuracy of the Citizen Scientists' ER classification, and the association between ER status and prognosis, by comparing their test performance against that of trained pathologists.

FINDINGS The area under the ROC curve was 0.95 (95% CI 0.94 to 0.96) for cancer cell identification and 0.97 (95% CI 0.96 to 0.97) for ER status. ER-positive tumors scored by Citizen Scientists were associated with survival in a similar way to those scored by trained pathologists. Survival probability at 15 years was 0.78 (95% CI 0.76 to 0.80) for ER-positive and 0.72 (95% CI 0.68 to 0.77) for ER-negative tumors based on Citizen Scientists' classification. Based on pathologist classification, survival probability was 0.79 (95% CI 0.77 to 0.81) for ER-positive and 0.71 (95% CI 0.67 to 0.74) for ER-negative tumors. The hazard ratio for death was 0.26 (95% CI 0.18 to 0.37) at diagnosis and became greater than one after 6.5 years of follow-up for ER scored by Citizen Scientists, and 0.24 (95% CI 0.18 to 0.33) at diagnosis, increasing thereafter to one after 6.7 (95% CI 4.1 to 10.9) years of follow-up, for ER scored by pathologists.

INTERPRETATION Crowdsourcing of the general public to classify cancer pathology data for research is viable, engages the public and provides accurate ER data. Crowdsourced classification of research data may offer a valid solution to problems of throughput requiring human input.
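The area under the ROC curve reported in the findings measures how well a continuous crowd score separates cores that the reference standard calls positive from those it calls negative. The following is a minimal sketch of that calculation, not the authors' analysis code: the function name `roc_auc`, the variable names, and the toy data are all hypothetical, and the per-core crowd score is assumed to be something like the fraction of Citizen Scientists who called the core ER-positive.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    labels: 1 if the reference (e.g. a pathologist) called the item positive, else 0.
    scores: crowd score per item; higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs that the score ranks correctly;
    # tied pairs count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
pathologist_er = [1, 1, 1, 0, 0, 1, 0, 0]
crowd_fraction_positive = [0.9, 0.8, 0.4, 0.3, 0.1, 0.7, 0.5, 0.2]

auc = roc_auc(pathologist_er, crowd_fraction_positive)
print(auc)
```

An AUC of 0.5 corresponds to a score that ranks positives and negatives no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of items, which is fine for illustration; a rank-based implementation would be preferable at the scale of thousands of cores.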


Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. However, the large volume of gene expression data makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...


Crowdsourcing scoring of immunohistochemistry images: Evaluating Performance of the Crowd and an Automated Computational Method

The assessment of protein expression in immunohistochemistry (IHC) images provides important diagnostic, prognostic and predictive information for guiding cancer diagnosis and therapy. Manual scoring of IHC images represents a logistical challenge, as the process is labor intensive and time consuming. Over the last decade, computational methods have been developed to enable the application of ...


Perform Three Data Mining Tasks with Crowdsourcing Process

In data mining studies, performing feature selection by hand is complex, so some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...


Prevalence of MPL (W515K/L) Mutations in Patients with Negative-JAK2 (V617F) Myeloproliferative Neoplasm in North-East of Iran

Background and Objective: Janus kinase 2 (JAK2) and Myeloproliferative Leukemia (MPL) mutations are confirmatory indicators for Myeloproliferative Neoplasm (MPN). The current study was performed to determine the frequency of MPL mutation in MPN patients without JAK2 mutation, in order to assign MPL mutation frequency in North-East of Iran. Methods: Total o...


While Urine and Plasma Decorin Remain Unchanged in Prostate Cancer, Prostatic Tissue Decorin Has a Prognostic Value

Background: Numerous studies have confirmed that a significant decrease in tissue decorin (DCN) expression is associated with tumor progression and metastasis in certain types of cancer, including prostate cancer (PC). However, the potential prognostic value of tissue DCN in PC has not yet been investigated. Methods: A total of 40 PC patients and 42 patients with benign prostatic hyperplasia (BPH) were inv...



Journal title:

Volume 2, Issue

Pages -

Publication date: 2015